智能论文笔记

T2FPV: Constructing High-Fidelity First-Person View Datasets From Real-World Pedestrian Trajectories

Benjamin Stoler , Meghdeep Jana , Soonmin Hwang , Jean Oh

分类：计算机视觉 | 机器人

2022-09-22

预测行人运动对于开发在拥挤的环境中相互作用的社会意识的机器人至关重要。虽然社交互动环境的自然视觉观点是一种自然的观点，但轨迹预测中的大多数现有作品纯粹是在自上而下的轨迹空间中进行的。为了支持第一人称视图轨迹预测研究，我们提出了T2FPV，这是一种构建高保真的第一人称视图数据集的方法，给定真实的，自上而下的轨迹数据集；我们在ETH/UCY行人数据集上展示了我们的方法，以生成所有互动行人的以自我为中心的视觉数据。我们报告说，原始的ETH/UCY数据集中使用的鸟眼视图假设，即代理可以用完美的信息观察场景中的每个人，而不会在第一人称视图中保持；在现有作品中通常使用的每个20个磁场场景中，只有一小部分的代理都可以完全看到。我们评估现有的轨迹预测方法在不同的现实感知水平下 - 与自上而下的完美信息设置相比，位移错误增加了356％。为了促进第一人称视图轨迹预测的研究，我们发布了T2FPV-ETH数据集和软件工具。

translated by 谷歌翻译

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

Taehyun Hwang , Min-hwan Oh

分类： (统计)机器学习 | 机器学习

2022-12-27

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is very restrictive. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves $\tilde{\mathcal{O}}(d \sqrt{H^3 T})$ regret bound where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and practical superior performance.

translated by 谷歌翻译

Reinforced Clarification Question Generation with Defeasibility Rewards for Disambiguating Social and Moral Situations

Valentina Pyatkin , Jena D. Hwang , Vivek Srikumar , Ximing Lu , Liwei Jiang , Yejin Choi , Chandra Bhagavatula

分类：自然语言处理

2022-12-20

Context is vital for commonsense moral reasoning. "Lying to a friend" is wrong if it is meant to deceive them, but may be morally okay if it is intended to protect them. Such nuanced but salient contextual information can potentially flip the moral judgment of an action. Thus, we present ClarifyDelphi, an interactive system that elicits missing contexts of a moral situation by generating clarification questions such as "Why did you lie to your friend?". Our approach is inspired by the observation that questions whose potential answers lead to diverging moral judgments are the most informative. We learn to generate questions using Reinforcement Learning, by maximizing the divergence between moral judgements of hypothetical answers to a question. Human evaluation shows that our system generates more relevant, informative and defeasible questions compared to other question generation baselines. ClarifyDelphi assists informed moral reasoning processes by seeking additional morally consequential context to disambiguate social and moral situations.

translated by 谷歌翻译

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Chandra Bhagavatula , Jena D. Hwang , Doug Downey , Ronan Le Bras , Ximing Lu , Keisuke Sakaguchi , Swabha Swayamdipta , Peter West , Yejin Choi

分类：自然语言处理

2022-12-19

Pre-trained language models, despite their rapid advancements powered by scale, still fall short of robust commonsense capabilities. And yet, scale appears to be the winning recipe; after all, the largest models seem to have acquired the largest amount of commonsense capabilities. Or is it? In this paper, we investigate the possibility of a seemingly impossible match: can smaller language models with dismal commonsense capabilities (i.e., GPT-2), ever win over models that are orders of magnitude larger and better (i.e., GPT-3), if the smaller models are powered with novel commonsense distillation algorithms? The key intellectual question we ask here is whether it is possible, if at all, to design a learning algorithm that does not benefit from scale, yet leads to a competitive level of commonsense acquisition. In this work, we study the generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce a novel commonsense distillation framework, I2D2, that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale models as the teacher model by two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-Tomic, that is of the largest and highest quality available to date.

translated by 谷歌翻译

Learning to Detect Semantic Boundaries with Image-level Class Labels

Namyup Kim , Sehyun Hwang , Suha Kwak

分类：计算机视觉 | 人工智能

2022-12-15

This paper presents the first attempt to learn semantic boundary detection using image-level class labels as supervision. Our method starts by estimating coarse areas of object classes through attentions drawn by an image classification network. Since boundaries will locate somewhere between such areas of different classes, our task is formulated as a multiple instance learning (MIL) problem, where pixels on a line segment connecting areas of two different classes are regarded as a bag of boundary candidates. Moreover, we design a new neural network architecture that can learn to estimate semantic boundaries reliably even with uncertain supervision given by the MIL strategy. Our network is used to generate pseudo semantic boundary labels of training images, which are in turn used to train fully supervised models. The final model trained with our pseudo labels achieves an outstanding performance on the SBD dataset, where it is as competitive as some of previous arts trained with stronger supervision.

translated by 谷歌翻译

Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map

Pengxiao Zang , Tristan T. Hormel , Jie Wang , Yukun Guo , Steven T. Bailey , Christina J. Flaxel , David Huang , Thomas S. Hwang , Yali Jia

分类：计算机视觉 | 机器学习

2022-12-13

Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers decision-making. A data set including 456 macular scans were graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combing two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.

translated by 谷歌翻译

MED-SE: Medical Entity Definition-based Sentence Embedding

Hyeonbin Hwang , Haanju Yoo , Yera Choi

分类：机器学习 | 人工智能 | 自然语言处理

2022-12-09

We propose Medical Entity Definition-based Sentence Embedding (MED-SE), a novel unsupervised contrastive learning framework designed for clinical texts, which exploits the definitions of medical entities. To this end, we conduct an extensive analysis of multiple sentence embedding techniques in clinical semantic textual similarity (STS) settings. In the entity-centric setting that we have designed, MED-SE achieves significantly better performance, while the existing unsupervised methods including SimCSE show degraded performance. Our experiments elucidate the inherent discrepancies between the general- and clinical-domain texts, and suggest that entity-centric contrastive approaches may help bridge this gap and lead to a better representation of clinical sentences.

translated by 谷歌翻译

An Open-Source Gazebo Plugin for GNSS Multipath Signal Emulation in Virtual Urban Canyons

Kartik Anand Pant , Zhanpeng Yang , James M Goppert , Inseok Hwang

分类：机器人

2022-12-08

One of the major errors affecting GNSS signals in urban canyons is GNSS multipath error. In this work, we develop a Gazebo plugin which utilizes a ray tracing technique to account for multipath effects in a virtual urban canyon environment using virtual satellites. This software plugin balances accuracy and computational complexity to run the simulation in real-time for both software-in-the-loop (SITL) and hardware-in-the-loop (HITL) testing. We also construct a 3D virtual environment of Hong Kong and compare the results from our plugin with the GNSS data in the publicly available Urban-Nav dataset, to validate the efficacy of the proposed Gazebo Plugin. The plugin is openly available to all the researchers in the robotics community. https://github.com/kpant14/multipath_sim

translated by 谷歌翻译

Mirror descent of Hopfield model

Hyungjoon Soh , Dongyeob Kim , Juno Hwang , Junghyo Jo

分类：机器学习

2022-11-29

Mirror descent is a gradient descent method that uses a dual space of parametric models. The great idea has been developed in convex optimization, but not yet widely applied in machine learning. In this study, we provide a possible way that the mirror descent can help data-driven parameter initialization of neural networks. We adopt the Hopfield model as a prototype of neural networks, we demonstrate that the mirror descent can train the model more effectively than the usual gradient descent with random parameter initialization.

translated by 谷歌翻译

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Benjamin Kiefer , Matej Kristan , Janez Perš , Lojze Žust , Fabio Poiesi , Fabio Augusto de Alcantara Andrade , Alexandre Bernardino , Matthew Dawkins , Jenni Raitoharju , Yitong Quan

分类：计算机视觉 | 人工智能 | 机器学习 | 机器人

2022-11-24

The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

translated by 谷歌翻译